Background


Data Collection

Plots were sampled at 27 independent sites throughout a subset of the Eastern Highland Rim ecoregion (delineated in tan below). Note that some sites were also in close proximity, making them difficult to see at broader zoom levels.

1031 1m^2 quadrats were sampled with differeing sample sizes per site based on the size of the site.

Data Configuration

Nearly all statistical packages require the data to be in a presence-absence form. There are several ways to do it (one of which can maintain cover values rather than changing it to binary data); I used a loop function. The result is a presence-absence matrix with Site as a column so subsamples can be organized accordingly.

Table 1. Sample of data in presence-absence format.
Site SCICYP PROPEC JUNDEB JUNREP CXJOOR CXLOUI
1.1.1 1 1 1 1 1 1 0
1.1.2 1 1 1 1 0 0 1
1.1.3 1 1 1 0 0 0 1
1.1.4 1 1 1 0 0 1 1
1.1.5 1 0 0 0 0 0 1

Diversity

Sites had widely varying observed total richness. The use of extrapolated species accumulation curves can tell us how many species are likely to be at the site based on how many were found in accumulating subsamples. However if a curve never flattens, it gives a wild estimate of richness (see Site 10).

What do the curves that made these estimates look like? Let’s take a look!

This plot is not particularly helpful other than to visualize the general span of observed and expected richnesses and sampling efforts. Examining the curves in portions of 3-4 is necessary to

These curves illustrate not only where the flattening point (expected richness) occurs, but also how quickly. Examining a curve can allow someone to estimate how many more samples would be needed to reach that point, however if doing so samples a larger area then the curve may never flatten.

Let’s see if sampling effort (# quadrats/area sampled) affected percent estimated sampling completion; if it did, that would be a big problem and I would have a lot of explaining to do to my committee.

There is no relationship between sampling effort and completion percentage (p=0.6366006). However, note that Sites 10 and 26 were flagged as outliers by the autoplot function. This inadequate sampling is likely the result of too few quadrats sampled at the wetland edge relative to the size of the wetland.

Evenness and Dominance


There appears to be a relationship between site richness and site area, but unexpectedly this relationship appears to be negative. Because the data are likely non-linear, a generalized linear model should be used to assess this relationship.

Communities

Which sites are similar?

There may be a lot of overlap in clusters due to the nestedness of some community types. We should revisit this later using quadrats as replicates.

Are sites similar because they’re geographically closer?


Call:
glm(formula = Sorensen ~ geographic, family = binomial, data = distance.f)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.13785  -0.13608   0.02589   0.19654   0.64852  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) 1.427e+00  1.626e-01   8.777   <2e-16 ***
geographic  6.933e-07  1.764e-06   0.393    0.694    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 43.121  on 701  degrees of freedom
Residual deviance: 42.966  on 700  degrees of freedom
AIC: 298.02

Number of Fisher Scoring iterations: 4

Ordinations

Implications

Plant communities aren’t real. We can all go home now.

---
title: "Deterministic processes drive wetland plant community composition and diversity"
author: "C. M. Ciafre"
output:
  html_document:
    df_print: paged
  html_notebook:
    df_print: paged
    highlight: zenburn
    number_sections: no
    rows.print: 10
    theme: journal
---
#   {.tabset .tabset-fade}
```{r setup, NOTICE USE of PACMAN, include=FALSE}
#install pacman first to automatically install and load any needed packages
pacman::p_load(ggplot2, dplyr, tidyr, reshape, iNEXT, knitr, kableExtra, ggfortify, ggpubr, vegan, geosphere, mclust, rgdal, leaflet, ggmap)

#Not sure why I keep this in
knitr::opts_chunk$set(echo = TRUE)

#Load data
sppxsites<-read.csv("data/VEGDATADONE.csv", header=TRUE)
sitenames<-read.csv("data/SiteNames.csv", header=TRUE)
sitesizes<-read.csv("data/SiteXsize2.csv", header=TRUE)
quaddatas<-read.csv("data/quadmetrics.csv", header=TRUE)
colnames(quaddatas)[colnames(quaddatas)=="Depth..m."] <- "Depth"
colnames(quaddatas)[colnames(quaddatas)=="X..Canopy"] <- "Canopy"
colnames(quaddatas)[colnames(quaddatas)=="Pond"] <- "Site"
site_points <- select(sitesizes, c("Site", "Latitude", "Longitude"))
#Note: there is no Site 15; it was ditched halfway through sampling because it was not independent from Site 14.
```

## Background 

<figure>
<img src="Images/QuadratCropped.jpg"></figure><br>

### Data Collection

Plots were sampled at 27 independent sites throughout a subset of the Eastern Highland Rim ecoregion (delineated in tan below). Note that some sites were also in close proximity, making them difficult to see at broader zoom levels.
```{r Static map prep, message=FALSE, warning=FALSE, include=FALSE}
EHRa <- readOGR("MappyBits/a.kml")
EHRb <- readOGR("MappyBits/b.kml")
EHRc <- readOGR("MappyBits/c.kml")
EHRd <- readOGR("MappyBits/d.kml")
EHRe <- readOGR("MappyBits/e.kml")
EHRf <- readOGR("MappyBits/f.kml")
EHRg <- readOGR("MappyBits/g.kml")
EHRh <- readOGR("MappyBits/h.kml")
EHRi <- readOGR("MappyBits/i.kml")
EHRj <- readOGR("MappyBits/j.kml")
EHRk <- readOGR("MappyBits/k.kml")
EHRl <- readOGR("MappyBits/l.kml")

outline_pointsa <- EHRa@polygons[[1]]@Polygons[[1]]@coords
colnames(outline_pointsa) <- c("X","Y")
outlinea <- Polygon(outline_pointsa)
sp_outlinea <- Polygons(list(outlinea),1)
outline_polya <- SpatialPolygons(list(sp_outlinea))
proj4string(outline_polya) <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84")

outline_pointsb <- EHRb@polygons[[1]]@Polygons[[1]]@coords
colnames(outline_pointsb) <- c("X","Y")
outlineb <- Polygon(outline_pointsb)
sp_outlineb <- Polygons(list(outlineb),1)
outline_polyb <- SpatialPolygons(list(sp_outlineb))
proj4string(outline_polyb) <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84")

outline_pointsc <- EHRc@polygons[[1]]@Polygons[[1]]@coords
colnames(outline_pointsc) <- c("X","Y")
outlinec <- Polygon(outline_pointsc)
sp_outlinec <- Polygons(list(outlinec),1)
outline_polyc <- SpatialPolygons(list(sp_outlinec))
proj4string(outline_polyc) <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84")

outline_pointsd <- EHRd@polygons[[1]]@Polygons[[1]]@coords
colnames(outline_pointsd) <- c("X","Y")
outlined <- Polygon(outline_pointsd)
sp_outlined <- Polygons(list(outlined),1)
outline_polyd <- SpatialPolygons(list(sp_outlined))
proj4string(outline_polyd) <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84")

outline_pointse <- EHRe@polygons[[1]]@Polygons[[1]]@coords
colnames(outline_pointse) <- c("X","Y")
outlinee <- Polygon(outline_pointse)
sp_outlinee <- Polygons(list(outlinee),1)
outline_polye <- SpatialPolygons(list(sp_outlinee))
proj4string(outline_polye) <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84")

outline_pointsf <- EHRf@polygons[[1]]@Polygons[[1]]@coords
colnames(outline_pointsf) <- c("X","Y")
outlinef <- Polygon(outline_pointsf)
sp_outlinef <- Polygons(list(outlinef),1)
outline_polyf <- SpatialPolygons(list(sp_outlinef))
proj4string(outline_polyf) <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84")

outline_pointsg <- EHRg@polygons[[1]]@Polygons[[1]]@coords
colnames(outline_pointsg) <- c("X","Y")
outlineg <- Polygon(outline_pointsg)
sp_outlineg <- Polygons(list(outlineg),1)
outline_polyg <- SpatialPolygons(list(sp_outlineg))
proj4string(outline_polyg) <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84")

outline_pointsh <- EHRh@polygons[[1]]@Polygons[[1]]@coords
colnames(outline_pointsh) <- c("X","Y")
outlineh <- Polygon(outline_pointsh)
sp_outlineh <- Polygons(list(outlineh),1)
outline_polyh <- SpatialPolygons(list(sp_outlineh))
proj4string(outline_polyh) <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84")

outline_pointsi <- EHRi@polygons[[1]]@Polygons[[1]]@coords
colnames(outline_pointsi) <- c("X","Y")
outlinei <- Polygon(outline_pointsi)
sp_outlinei <- Polygons(list(outlinei),1)
outline_polyi <- SpatialPolygons(list(sp_outlinei))
proj4string(outline_polyi) <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84")

outline_pointsj <- EHRj@polygons[[1]]@Polygons[[1]]@coords
colnames(outline_pointsj) <- c("X","Y")
outlinej <- Polygon(outline_pointsj)
sp_outlinej <- Polygons(list(outlinej),1)
outline_polyj <- SpatialPolygons(list(sp_outlinej))
proj4string(outline_polyj) <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84")

outline_pointsk <- EHRk@polygons[[1]]@Polygons[[1]]@coords
colnames(outline_pointsk) <- c("X","Y")
outlinek <- Polygon(outline_pointsk)
sp_outlinek <- Polygons(list(outlinek),1)
outline_polyk <- SpatialPolygons(list(sp_outlinek))
proj4string(outline_polyk) <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84")

outline_pointsl <- EHRl@polygons[[1]]@Polygons[[1]]@coords
colnames(outline_pointsl) <- c("X","Y")
outlinel <- Polygon(outline_pointsl)
sp_outlinel <- Polygons(list(outlinel),1)
outline_polyl <- SpatialPolygons(list(sp_outlinel))
proj4string(outline_polyl) <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84")
```


```{r Static map, echo=FALSE, message=FALSE, warning=FALSE}
# state <- map_data("state")
# county <- map_data("county")
# 
# 
# tn <- county %>%
#   filter(region=="tennessee")
# ky <- county %>%
#   filter(region=="kentucky")
# al <- county %>%
#   filter(region=="alabama")


# ggplot() +
#   geom_polygon(data = state, aes(x=long, y = lat, group = group),
#                         fill = "white", color="black") +
#   geom_polygon(data = tn, aes(x=long, y = lat, group = group),
#                         fill = "white", color="black") +
#   geom_polygon(data = ky, aes(x=long, y = lat, group = group),
#                         fill = "white", color="black") +
#   geom_polygon(data = al, aes(x=long, y = lat, group = group),
#                         fill = "white", color="black") +
#   geom_polygon(data=outline_polya, aes(x=outline_polya@polygons[[1]]@Polygons[[1]]@coords[,1],
#                                       y=outline_polya@polygons[[1]]@Polygons[[1]]@coords[,2]),
#                alpha = .8, fill="orange") +
#   geom_polygon(data=outline_polyb, aes(x=outline_polyb@polygons[[1]]@Polygons[[1]]@coords[,1],
#                                       y=outline_polyb@polygons[[1]]@Polygons[[1]]@coords[,2]),
#                alpha = .8, fill="orange") +
#   geom_polygon(data=outline_polyc, aes(x=outline_polyc@polygons[[1]]@Polygons[[1]]@coords[,1],
  #                                     y=outline_polyc@polygons[[1]]@Polygons[[1]]@coords[,2]),
  #              alpha = .8, fill="orange") +
  # geom_polygon(data=outline_polyd, aes(x=outline_polyd@polygons[[1]]@Polygons[[1]]@coords[,1],
  #                                     y=outline_polyd@polygons[[1]]@Polygons[[1]]@coords[,2]),
  #              alpha = .8, fill="orange") +
  # geom_polygon(data=outline_polye, aes(x=outline_polye@polygons[[1]]@Polygons[[1]]@coords[,1],
  #                                     y=outline_polye@polygons[[1]]@Polygons[[1]]@coords[,2]),
  #              alpha = .8, fill="orange") +
  # geom_polygon(data=outline_polyf, aes(x=outline_polyf@polygons[[1]]@Polygons[[1]]@coords[,1],
  #                                     y=outline_polyf@polygons[[1]]@Polygons[[1]]@coords[,2]),
  #              alpha = .8, fill="orange") +
  # geom_polygon(data=outline_polyg, aes(x=outline_polyg@polygons[[1]]@Polygons[[1]]@coords[,1],
  #                                     y=outline_polyg@polygons[[1]]@Polygons[[1]]@coords[,2]),
  #              alpha = .8, fill="orange") +
  # geom_polygon(data=outline_polyh, aes(x=outline_polyh@polygons[[1]]@Polygons[[1]]@coords[,1],
  #                                     y=outline_polyh@polygons[[1]]@Polygons[[1]]@coords[,2]),
  #              alpha = .8, fill="orange") +
  # geom_polygon(data=outline_polyi, aes(x=outline_polyi@polygons[[1]]@Polygons[[1]]@coords[,1],
  #                                     y=outline_polyi@polygons[[1]]@Polygons[[1]]@coords[,2]),
  #              alpha = .8, fill="orange") +
  # geom_polygon(data=outline_polyj, aes(x=outline_polyj@polygons[[1]]@Polygons[[1]]@coords[,1],
  #                                     y=outline_polyj@polygons[[1]]@Polygons[[1]]@coords[,2]),
  #              alpha = .8, fill="orange") +
  # geom_polygon(data=outline_polyk, aes(x=outline_polyk@polygons[[1]]@Polygons[[1]]@coords[,1],
  #                                     y=outline_polyk@polygons[[1]]@Polygons[[1]]@coords[,2]),
  #              alpha = .8, fill="orange") +
  # geom_polygon(data=outline_polyl, aes(x=outline_polyl@polygons[[1]]@Polygons[[1]]@coords[,1],
  #                                     y=outline_polyl@polygons[[1]]@Polygons[[1]]@coords[,2]),
  #              alpha = .8, fill="orange") +
  # geom_point(data = site_points, aes(x=Longitude,y=Latitude), color="black") +
  # # coord_fixed(xlim = c(-90.5, -82),  ylim = c(34.5, 37.6), ratio = 1.2) +
  # xlab("Longitude") + ylab("Latitude") + ggtitle("Eastern Highland Rim subset (gold) and study sites")

# geom_map(data = data, map = map, aes(map_id = countries, fill = color)) 
```

```{r Leaflet map, echo=FALSE, fig.width = 8.5, fig.height = 5.5}
leaflet(data=sitesizes)%>%
  setView(-85.87237, 35.78165, zoom=7)%>% 
  addTiles()%>%
      addPolygons(outline_polya, 
              lng = outline_polya@polygons[[1]]@Polygons[[1]]@coords[,1], 
              lat = outline_polya@polygons[[1]]@Polygons[[1]]@coords[,2],
              color = "#DD8D29",
              opacity= 9,
              weight = 2) %>%
      addPolygons(outline_polyb, 
              lng = outline_polyb@polygons[[1]]@Polygons[[1]]@coords[,1], 
              lat = outline_polyb@polygons[[1]]@Polygons[[1]]@coords[,2],
              color = "#DD8D29",
              opacity= 9,
              weight = 2) %>%
      addPolygons(outline_polyc, 
              lng = outline_polyc@polygons[[1]]@Polygons[[1]]@coords[,1], 
              lat = outline_polyc@polygons[[1]]@Polygons[[1]]@coords[,2],
              color = "#DD8D29",
              opacity= 9,
              weight = 2) %>%
      addPolygons(outline_polyd, 
              lng = outline_polyd@polygons[[1]]@Polygons[[1]]@coords[,1], 
              lat = outline_polyd@polygons[[1]]@Polygons[[1]]@coords[,2],
              color = "#DD8D29",
              opacity= 9,
              weight = 2) %>%
      addPolygons(outline_polye, 
              lng = outline_polye@polygons[[1]]@Polygons[[1]]@coords[,1], 
              lat = outline_polye@polygons[[1]]@Polygons[[1]]@coords[,2],
              color = "#DD8D29",
              opacity= 9,
              weight = 2) %>%
      addPolygons(outline_polyf, 
              lng = outline_polyf@polygons[[1]]@Polygons[[1]]@coords[,1], 
              lat = outline_polyf@polygons[[1]]@Polygons[[1]]@coords[,2],
              color = "#DD8D29",
              opacity= 9,
              weight = 2) %>%
      addPolygons(outline_polyg, 
              lng = outline_polyg@polygons[[1]]@Polygons[[1]]@coords[,1], 
              lat = outline_polyg@polygons[[1]]@Polygons[[1]]@coords[,2],
              color = "#DD8D29",
              opacity= 9,
              weight = 2) %>%
      addPolygons(outline_polyh, 
              lng = outline_polyh@polygons[[1]]@Polygons[[1]]@coords[,1], 
              lat = outline_polyh@polygons[[1]]@Polygons[[1]]@coords[,2],
              color = "#DD8D29",
              opacity= 9,
              weight = 2) %>%
      addPolygons(outline_polyi, 
              lng = outline_polyi@polygons[[1]]@Polygons[[1]]@coords[,1], 
              lat = outline_polyi@polygons[[1]]@Polygons[[1]]@coords[,2],
              color = "#DD8D29",
              opacity= 9,
              weight = 2) %>%
      addPolygons(outline_polyj, 
              lng = outline_polyj@polygons[[1]]@Polygons[[1]]@coords[,1], 
              lat = outline_polyj@polygons[[1]]@Polygons[[1]]@coords[,2],
              color = "#DD8D29",
              opacity= 9,
              weight = 2) %>%
      addPolygons(outline_polyk, 
              lng = outline_polyk@polygons[[1]]@Polygons[[1]]@coords[,1], 
              lat = outline_polyk@polygons[[1]]@Polygons[[1]]@coords[,2],
              color = "#DD8D29",
              opacity= 9,
              weight = 2) %>%
      addPolygons(outline_polyl, 
              lng = outline_polyl@polygons[[1]]@Polygons[[1]]@coords[,1], 
              lat = outline_polyl@polygons[[1]]@Polygons[[1]]@coords[,2],
              color = "#DD8D29",
              opacity= 9,
              weight = 2) %>%
      addCircleMarkers(data = sitesizes, lat = ~Latitude, lng = ~Longitude,
                   label = ~Site,
                   popup = ~Community,
                   opacity= 100,
                   weight = 2,
                   color= "black",
                   radius = ~4)%>%
  addProviderTiles(providers$Esri.NatGeoWorldMap, group = "NatGeo")%>%
  addProviderTiles(providers$Esri.WorldImagery, group = "ESRI") %>%
  addMiniMap(zoomLevelOffset = -4)%>%
  addScaleBar()%>%
  addLayersControl(
    baseGroups = c("NatGeo", "ESRI"),
    options = layersControlOptions(collapsed = FALSE))
```








1031 1m^2 quadrats were sampled with differeing sample sizes per site based on the size of the site. 

### Data Configuration

Nearly all statistical packages require the data to be in a presence-absence form. There are several ways to do it (one of which can maintain cover values rather than changing it to binary data); I used a loop function. The result is a presence-absence matrix with Site as a column so subsamples can be organized accordingly. 

```{r Loop magic, cache=TRUE, include=FALSE}
sites_sub<-unique(sppxsites$ID)
spp_sub<-unique(sppxsites$Species)

#Make a matrix with loop function
spp_commat <- matrix(0, length(sites_sub), length(spp_sub))
for (i in 1:nrow(spp_commat)){temp_sites <- sppxsites[which(sppxsites$ID == sites_sub[i]),]
  spp_commat[i, which(spp_sub%in%temp_sites$Species)]<- 1
  print(i)}

#Name rows and Columns
rownames(spp_commat) <- as.character(sites_sub)
colnames(spp_commat) <- as.character(spp_sub)

#Change matrix into dataframe
spp_commat.df<- as.data.frame(spp_commat)

#Make empty quadrats truly empty
spp_commat.df.fixed<-subset(spp_commat.df, select=-c(Empty))

#Add site column back in and change its name
AllPonds <- cbind(sitenames$Pond,spp_commat.df.fixed)
names(AllPonds)[names(AllPonds)=="sitenames$Pond"] <- "Site"
```

```{r Kable 1, echo=FALSE}
AllPondsTrimmed<-select(AllPonds, c(1:7))
kable(AllPondsTrimmed[1:5, ], format = "pandoc", full_width=F, caption = 'Table 1. Sample of data in presence-absence format.')
```


## Diversity

```{r Split sites, include=FALSE}
#Break up AllPonds into dataframes by site
#The first column has to be removed for each
#Transpose the dataframes so species are columns

S01<-t(select(filter(AllPonds, Site == 1), -1))
S02<-t(select(filter(AllPonds, Site == 2), -1))
S03<-t(select(filter(AllPonds, Site == 3), -1))
S04<-t(select(filter(AllPonds, Site == 4), -1))
S05<-t(select(filter(AllPonds, Site == 5), -1))
S06<-t(select(filter(AllPonds, Site == 6), -1))
S07<-t(select(filter(AllPonds, Site == 7), -1))
S08<-t(select(filter(AllPonds, Site == 8), -1))
S09<-t(select(filter(AllPonds, Site == 9), -1))
S10<-t(select(filter(AllPonds, Site == 10), -1))
S11<-t(select(filter(AllPonds, Site == 11), -1))
S12<-t(select(filter(AllPonds, Site == 12), -1))
S13<-t(select(filter(AllPonds, Site == 13), -1))
S14<-t(select(filter(AllPonds, Site == 14), -1))
S16<-t(select(filter(AllPonds, Site == 16), -1))
S17<-t(select(filter(AllPonds, Site == 17), -1))
S18<-t(select(filter(AllPonds, Site == 18), -1))
S19<-t(select(filter(AllPonds, Site == 19), -1))
S20<-t(select(filter(AllPonds, Site == 20), -1))
S21<-t(select(filter(AllPonds, Site == 21), -1))
S22<-t(select(filter(AllPonds, Site == 22), -1))
S23<-t(select(filter(AllPonds, Site == 23), -1))
S24<-t(select(filter(AllPonds, Site == 24), -1))
S25<-t(select(filter(AllPonds, Site == 25), -1))
S26<-t(select(filter(AllPonds, Site == 26), -1))
S27<-t(select(filter(AllPonds, Site == 27), -1))
S28<-t(select(filter(AllPonds, Site == 28), -1))
```

```{r Unicorn vomit prep, fig.height=6, fig.width=10, message=FALSE, warning=FALSE, cache=TRUE, include=FALSE}
#Make a list of all site dataframes
site.list.all = list(S01=S01,S02=S02,S03=S03,S04=S04,S05=S05,S06=S06,S07=S07,S08=S08,S09=S09,S10=S10,S11=S11,S12=S12,S13=S13,S14=S14,S16=S16,S17=S17,S18=S18,S19=S19,S20=S20,S21=S21,S22=S22,S23=S23,S24=S24,S25=S25,S26=S26,S27=S27,S28=S28)
#Convert everything in list to incidence frequencies
site.list.freq.all = lapply(site.list.all, as.incfreq)
```

```{r Unicorn vomit prep2, fig.height=6, fig.width=10, message=FALSE, warning=FALSE, cache=TRUE, include=FALSE}
out.inc.all<-iNEXT(site.list.freq.all, q=0, datatype="incidence_freq", nboot=10000)
```

```{r Kable 2, include=FALSE}
#The iNEXT output ("out.inc.all") contains three tables; "AsyEst" has all the data we need
#Saved it as an object and changed it so it would be more readable
RichDiv<-out.inc.all$AsyEst
RichDivT<-subset(RichDiv, select=-c(s.e., LCL, UCL))
RichDivObs<-subset(spread(RichDivT, Diversity, Observed), select=-c(Estimator))
RichDivEst<-subset(spread(RichDivT, Diversity, Estimator), select=-c(Observed))

#Trim down column names so they don't suck
names(RichDivObs)<-c("Site", "RichnessObs", "ShannonObs", "SimpsonObs")
names(RichDivEst)<-c("Site", "RichnessExp", "ShannonExp", "SimpsonExp")

#Collapse sites; have to do funky things to deal with NAs
compress <- function(x) c(na.omit(x), NA)[1]
RichDivObs.1<-aggregate(RichDivObs[2:4], RichDivObs[1], compress)
RichDivEst.1<-aggregate(RichDivEst[2:4], RichDivEst[1], compress)

#Remove site column from RichDivEst so it's not duplicated when merging
RichDivEst.2<-subset(RichDivEst.1, select=-c(Site))

#Merge everybody together
RichDivFinal<-cbind(RichDivObs.1, RichDivEst.2)

#Reorder columns so they make sense
RichDivTab<-RichDivFinal[,c(1,2,5,3,6,4,7)]
RichDivKab<-RichDivTab[c(1, 6, 7, 8, 10, 26, 27),]

kable(RichDivKab[1:7,], format = "pandoc", full_width=F, caption = 'Table 2. Observed and expected Hill numbers from a portion of the sites.')
```

```{r Est and Obs Richness, echo=FALSE, fig.height=4, fig.width=10, message=FALSE, warning=FALSE}
RichnessOnly<-filter(RichDiv, Diversity == "Species richness")
colnames(RichnessOnly)[colnames(RichnessOnly)=="LCL"] <- "RichLCL"
colnames(RichnessOnly)[colnames(RichnessOnly)=="UCL"] <- "RichUCL"
colnames(RichnessOnly)[colnames(RichnessOnly)=="Estimator"] <- "Estimated"
RichGath<-gather(RichnessOnly,Richness,value,Observed:Estimated)
RichSize<- cbind(RichGath, sitesizes$Quadrats)
colnames(RichSize)[colnames(RichSize)=="sitesizes$Quadrats"] <- "Quadrats"


ggplot(RichGath, aes(x=Site, y=value, color=Richness)) +
  scale_color_manual(values=c("darkgrey", "black"))+
  geom_errorbar(aes(ymin=RichLCL, ymax=RichUCL), width=.3, color="darkgrey") +
  geom_point(size=2)+
  ylab("Species Richness")+
  coord_cartesian(ylim = c(0, 250))+
  ggtitle("Estimated and observed species richness by site, with 95% confidence intervals.")+
  labs(caption="Note that upper confidence interval for Site 26 extends to 262.8 and Site 10 extends to 408.52.")+
  theme(axis.title=element_text(size=14), plot.title = element_text(size=14))
```

Sites had widely varying observed total richness. The use of extrapolated species accumulation curves can tell us how many species are likely to be at the site based on how many were found in accumulating subsamples. However if a curve never flattens, it gives a wild estimate of richness (see Site 10).

What do the curves that made these estimates look like? Let's take a look!

```{r Unicorn vomit, echo=FALSE, fig.height=6, fig.width=10, message=FALSE, warning=FALSE, cache=TRUE}
ggiNEXT(out.inc.all, type=1, color.var="site") + 
  theme_bw(base_size = 18) +
  ylab("Species Richness") +
  xlab("Number of Quadrats") +
  ggtitle("Interpolated and extrapolated richness") +
  labs(caption="B=10000.")
```

This plot is not particularly helpful other than to visualize the general span of observed and expected richnesses and sampling efforts. Examining the curves in portions of 3-4 is necessary to 

```{r Smaller curves 1, include=FALSE}
#List the dataframes, then convert them into incidence frequencies
#I broke them into manageable chunks that graph more clearly than everything at once

site.list.1 = list(S03=S03, S08=S08,  S09=S09, S28=S28)
site.list.freq.1 = lapply(site.list.1, as.incfreq)

site.list.2 = list(S01=S01, S02=S02,  S11=S11, S27=S27)
site.list.freq.2 = lapply(site.list.2, as.incfreq)

site.list.3 = list(S10=S10, S13=S13, S14=S14, S23=S23)
site.list.freq.3 = lapply(site.list.3, as.incfreq)


site.list.4 = list(S05=S05,S12=S12, S17=S17)
site.list.freq.4 = lapply(site.list.4, as.incfreq)

site.list.5 = list( S19=S19, S20=S20, S21=S21, S25=S25)
site.list.freq.5 = lapply(site.list.5, as.incfreq)

site.list.6 = list(S06=S06, S16=S16, S22=S22, S26=S26)
site.list.freq.6 = lapply(site.list.6, as.incfreq)

site.list.7 = list(S04=S04, S07=S07, S18=S18, S24=S24)
site.list.freq.7 = lapply(site.list.7, as.incfreq)
```

```{r Smaller curves 2, echo=FALSE, fig.height=4, fig.width=7, message=FALSE, warning=FALSE, cache=TRUE}
#Richness (q=0), Shannon Div (q=1), Simpson Div (q=2)
#Use grey=TRUE to make all lines black
out.inc1<-iNEXT(site.list.freq.1, q=0, datatype="incidence_freq", size=NULL, nboot=100)
A<-ggiNEXT(out.inc1, type=1, color.var="site") + 
  theme_bw(base_size = 18) +
  ylab("Species Richness") +
  xlab("Number of Quadrats") +
  labs(caption="B=100.")

out.inc2<-iNEXT(site.list.freq.2, q=0, datatype="incidence_freq", size=NULL, nboot=100)
B<-ggiNEXT(out.inc2, type=1, color.var="site") + 
  theme_bw(base_size = 18) +
  ylab("Species Richness") +
  xlab("Number of Quadrats") +
  labs(caption="B=100.")

out.inc3<-iNEXT(site.list.freq.3, q=0, datatype="incidence_freq", size=NULL, nboot=100)
C<-ggiNEXT(out.inc3, type=1, color.var="site") + 
  theme_bw(base_size = 18) +
  ylab("Species Richness") +
  xlab("Number of Quadrats") +
  labs(caption="B=100.")

out.inc4<-iNEXT(site.list.freq.4, q=0, datatype="incidence_freq", size=NULL, nboot=100)
D<-ggiNEXT(out.inc4, type=1, color.var="site") + 
  theme_bw(base_size = 18) +
  ylab("Species Richness") +
  xlab("Number of Quadrats") +
  labs(caption="B=100.")

out.inc5<-iNEXT(site.list.freq.5, q=0, datatype="incidence_freq", size=NULL, nboot=100)
E<-ggiNEXT(out.inc5, type=1, color.var="site") + 
  theme_bw(base_size = 18) +
  ylab("Species Richness") +
  xlab("Number of Quadrats") +
  labs(caption="B=100.")

out.inc6<-iNEXT(site.list.freq.6, q=0, datatype="incidence_freq", size=NULL, nboot=100)
F<-ggiNEXT(out.inc6, type=1, color.var="site") + 
  theme_bw(base_size = 18) +
  ylab("Species Richness") +
  xlab("Number of Quadrats") +
  labs(caption="B=100.")

out.inc7<-iNEXT(site.list.freq.7, q=0, datatype="incidence_freq", size=NULL, nboot=100)
G<-ggiNEXT(out.inc7, type=1, color.var="site") + 
  theme_bw(base_size = 18) +
  ylab("Species Richness") +
  xlab("Number of Quadrats") +
  labs(caption="B=100.")
```

```{r Many curves output, fig.height=25, fig.width=7, include=FALSE}
ggarrange(A, B, C, D, E, F, G, nrow=7)
```
 
```{r Data rearranging, echo=FALSE, fig.height=4, fig.width=10, message=FALSE, warning=FALSE}
Intervals<-select(slice(RichGath, c(28:54)), c(RichLCL, RichUCL))
RichInts<-cbind(RichDivTab, Intervals)
RichInts$PercComplete <- RichInts$RichnessObs/RichInts$RichnessExp*100
RichInts$PercCompleteU<- RichInts$RichnessObs/RichInts$RichUCL*100
RichInts$PercCompleteL<- RichInts$RichnessObs/RichInts$RichLCL*100
sitesizes$QuadsArea<- sitesizes$Quadrats/(pi*(sitesizes$Length^2))
sitesizes$Area<-(pi*(sitesizes$Length^2))
# 
# THIS PLOT IS TRASH
# ggplot(RichInts, aes(x=Site, y=PercComplete)) +
#   geom_bar(aes(y=sitesizes$Quadrats,x=Site), stat="identity", width = 0.75, fill="lightgrey")+
#   geom_point(size=2)+
#   geom_errorbar(aes(ymin=PercCompleteL, ymax=PercCompleteU), width=.3) +
#   ylab("Observed/Expected Richness * 100")+
#   scale_y_continuous(sec.axis = sec_axis(~., name = "Quadrats Sampled"))+
#   ggtitle("Completion percentage and sample size per site")+
#   theme(axis.title=element_text(size=14), plot.title = element_text(size=14))
```

```{r LM Sampling sufficiency, include=FALSE}
lmRichQuad <- lm(RichInts$PercComplete ~ sitesizes$QuadsArea)
autoplot(lmRichQuad)
aRichQuad<-anova(lmRichQuad)
aRichQuad
```

These curves illustrate not only *where* the flattening point (expected richness) occurs, but also *how quickly*. Examining a curve can allow someone to estimate how many more samples would be needed to reach that point, however if doing so samples a larger area then the curve may never flatten. 

Let's see if sampling effort (# quadrats/area sampled) affected percent estimated sampling completion; if it did, that would be a big problem and I would have a lot of explaining to do to my committee.

```{r Sampling sufficiency, echo=FALSE}
ggplot(RichInts, aes(x=sitesizes$QuadsArea, y=PercComplete, color="black")) +
  geom_smooth(method="lm", color="black")+
  geom_text(label=RichInts$Site, color="black", size=3)+
  ylab("Estimated % Species Sampled")+
  xlab(expression(Quadrats~Sampled/Site~Area~(m^2)))+
  ggtitle("Estimated sampling completion does not increase with sampling effort")+
  coord_cartesian(ylim = c(0, 100))+
  theme(axis.title=element_text(size=14), plot.title = element_text(size=14))
```

There is no relationship between sampling effort and completion percentage (p=`r aRichQuad[1,5]`). However, note that Sites 10 and 26 were flagged as outliers by the autoplot function. This inadequate sampling is likely the result of too few quadrats sampled at the wetland edge relative to the size of the wetland. 



## Evenness and Dominance

<figure>
<img src="Images/Goosecropped.jpg"></figure><br>

```{r Site size and richness, echo=FALSE, message=FALSE, warning=FALSE}
quaddies<-merge(sitesizes, RichInts, by="Site")
quaddies$Latitude <- as.numeric(as.character(quaddies$Latitude))

ggplot(RichInts, aes(x=sitesizes$Area, y=RichInts$RichnessObs)) +
  geom_point(size=2)+
  ylab("Site richness")+
  xlab(expression(Site~area~(m^2)))+
  ggtitle("Site area does not have a positive affect on site richness")+
  theme(axis.title=element_text(size=14), plot.title = element_text(size=14))
```

There appears to be a relationship between site richness and site area, but unexpectedly this relationship appears to be negative. Because the data are likely non-linear, a generalized linear model should be used to assess this relationship.




## Communities

```{r Sorensen time, include=FALSE}
sum01<-rowSums(S01)
sum02<-rowSums(S02)
sum03<-rowSums(S03)
sum04<-rowSums(S04)
sum05<-rowSums(S05)
sum06<-rowSums(S06)
sum07<-rowSums(S07)
sum08<-rowSums(S08)
sum09<-rowSums(S09)
sum10<-rowSums(S10)
sum11<-rowSums(S11)
sum12<-rowSums(S12)
sum13<-rowSums(S13)
sum14<-rowSums(S14)
sum16<-rowSums(S16)
sum17<-rowSums(S17)
sum18<-rowSums(S18)
sum19<-rowSums(S19)
sum20<-rowSums(S20)
sum21<-rowSums(S21)
sum22<-rowSums(S22)
sum23<-rowSums(S23)
sum24<-rowSums(S24)
sum25<-rowSums(S25)
sum26<-rowSums(S26)
sum27<-rowSums(S27)
sum28<-rowSums(S28)

summary<-as.data.frame(rbind(sum01, sum02, sum03, sum04, sum05, sum06, sum07, sum08, sum09, sum10, sum11, sum12, sum13, sum14, sum16, sum17, sum18, sum19, sum20, sum21, sum22, sum23, sum24, sum25, sum26, sum27, sum28))
sums<-colSums(summary)
#TO DELETE SINGLETONS
sumsums<-rbind(sums, summary)
rownames(sumsums)[1]<-"sum"
sumsums1 = sumsums[,sums > 1]
ncol(sumsums[,sums == 1])
#235** singletons!

#Convert to binary
sumsums1[sumsums1 > 0] <- 1

sumsums2<-slice(sumsums1, -c(1))

#0=identical, 1=the most dissimilar
sorensen<-vegdist(sumsums2, method="bray", binary=TRUE, diag=TRUE, upper=FALSE, na.rm = FALSE)
sorensen.m<-as.matrix(sorensen)
sorensen.m.t<-signif(sorensen.m, digits = 3)
```

```{r Distance Matrix, include=FALSE}
sitesizesgeo<-select(sitesizes, c("Site", "Latitude", "Longitude"))
sitesizesgeo.1<-sitesizesgeo[,c(1,3,2)]
sitesizesgeo.2<-sitesizesgeo.1[,c(2,3)]
distance.m<-distm(sitesizesgeo.2)
```

```{r echo=FALSE, message=FALSE, warning=FALSE}
#This is really sloppy
sorensencol<-as.data.frame(sorensen.m.t)
distancecol<-as.data.frame(distance.m)
sorensenmelt<-melt(sorensencol, variable.name = 'Sorensen')
geographicmelt<-melt(distancecol, variable.name = 'Geographic')
distance.1<-cbind(geographicmelt, sorensenmelt)
names(distance.1)<-c("Var1", "geographic", "Var2", "Sorensen")
#Remove zero's because they occur whwere sites match
distance.f<-filter(distance.1, geographic > 0)
```

```{r NMDS prep, include=FALSE}
sorensen.nmds01 <- metaMDS(sorensen, k=10)
sorstressplot<-stressplot(sorensen.nmds01)

```


```{r NMDS, echo=FALSE, message=FALSE, warning=FALSE}
cluster<-hclust(sorensen, method="average")
grp<-cutree(cluster,6)
plot(sorensen.nmds01, type="text")
ordicluster(sorensen.nmds01, cluster, prune = 7, display = "sites",
         w = weights(sorensen.nmds01, display))
# ordiellipse(sorensen.nmds01, group=grp, display="sites", kind=c("sd"), draw="lines", conf=0.95, lwd=2.6, col="grey44")
```

### Which sites are similar?

There may be a lot of overlap in clusters due to the nestedness of some community types. We should revisit this later using quadrats as replicates.

### Are sites similar because they're geographically closer?

```{r echo=FALSE, message=FALSE, warning=FALSE}
ggplot(distance.f, aes(x=distance.f$geographic, y=distance.f$Sorensen)) +
  geom_point()+
  geom_smooth(method="glm", method.args=list(family="binomial"(link="logit")))+
  ylab("Sorensen dissimilarity index")+
  xlab("Geographic distance (m)")+
  ggtitle("Geographic distance between sites has no effect on species similarity")+
  coord_cartesian(ylim = c(0, 1))
```


```{r echo=FALSE, message=FALSE, warning=FALSE}
distance.glm<-glm(Sorensen~geographic, family=binomial, data=distance.f)
summary(distance.glm)
```

Ordinations

## Implications

Plant communities aren't real. We can all go home now.
